Robust Speech-Annotated Photo Retrieval Using Syllable-Transformed Patterns

Authors

  • Chien-Lin Huang
  • Wei-Chuan Lee
  • Chung-Hsien Wu
Abstract

This study presents a robust indexing and retrieval scheme for speech-annotated digital photos, based on syllable-transformed patterns. In speech retrieval applications, out-of-vocabulary words and recognition errors generally lead to incorrect transcriptions and therefore degrade retrieval performance. In this study, the recognized n-best syllable candidates for each syllable are regarded as an ordered pattern and converted into an “image-like” pattern using multidimensional scaling (MDS) for indexing and retrieval. Vector quantization is then applied to cluster the image vectors into indexing codewords. Finally, a vector space model (VSM)-based indexing mechanism is used for photo retrieval with spoken queries. Experiments were conducted on the speech annotations of 1,055 collected digital photos. Compared with conventional methods, the syllable-transformed pattern method shows a promising improvement in speech-annotated photo retrieval.
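The last two stages named in the abstract, vector quantization into codewords and VSM-based ranking, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the MDS step has already produced low-dimensional vectors, and the 2-D vectors, the four-entry codebook, the photo names, the smoothed TF-IDF weighting, and all function names below are hypothetical choices made for illustration only.

```python
import math
from collections import Counter


def nearest_codeword(vector, codebook):
    """Index of the codebook entry closest to `vector` (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda k: math.dist(vector, codebook[k]))


def quantize(vectors, codebook):
    """Map a sequence of pattern vectors to a sequence of codeword indices."""
    return [nearest_codeword(v, codebook) for v in vectors]


def inverse_document_frequency(docs):
    """idf = log(N / df) + 1 per codeword (the +1 smoothing is an assumption)."""
    n = len(docs)
    df = Counter(cw for doc in docs for cw in set(doc))
    return {cw: math.log(n / count) + 1.0 for cw, count in df.items()}


def weight(codes, idf):
    """TF-IDF weights; codewords never seen in the index get weight 0."""
    tf = Counter(codes)
    return {cw: count * idf.get(cw, 0.0) for cw, count in tf.items()}


def cosine(a, b):
    """Cosine similarity between two sparse weight dictionaries."""
    dot = sum(w * b.get(cw, 0.0) for cw, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query_codes, indexed):
    """Return (photo name, score) pairs sorted by descending similarity."""
    idf = inverse_document_frequency(list(indexed.values()))
    q = weight(query_codes, idf)
    scores = {name: cosine(q, weight(codes, idf)) for name, codes in indexed.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical codebook; in practice it would be learned by clustering.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

# Made-up MDS output vectors for three annotated photos and one spoken query.
photo_vectors = {
    "photo_a": [(0.1, 0.0), (0.9, 0.1), (0.1, 0.1)],
    "photo_b": [(0.0, 0.9), (1.0, 1.1)],
    "photo_c": [(0.9, 0.0), (0.1, 1.0), (0.95, 0.9)],
}
query_vectors = [(0.05, 0.05), (1.0, 0.05)]

indexed = {name: quantize(vs, codebook) for name, vs in photo_vectors.items()}
ranking = rank(quantize(query_vectors, codebook), indexed)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy setup the query shares two codewords with `photo_a`, one with `photo_c`, and none with `photo_b`, so the ranking follows that order. Because matching happens on quantized codewords rather than on exact syllable transcriptions, nearby pattern vectors can still match when the recognizer output differs slightly.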


Similar resources

Development of syllable structure in Azeri-speaking children

Introduction: The length and complexity of syllable structure in children's utterances increase with age. Given the important and determining role of the syllable in the speech process, developmental studies of syllable acquisition in children are essential. The aim of the present study was to investigate the development and acquisition of syllable structure and the distribu...


Syllable timing patterns in Polish: results from annotation mining

Previous studies of duration variation in syllable constituents have yielded results for Polish which are clear outliers in relation to those for other languages. We report on a study of this issue in the context of TTS development, using a large annotated database. Global and local duration distance measures are applied to phoneme and syllable level units, and generalised iambic and trochaic d...


A robust/fast spoken term detection method based on a syllable n-gram index with a distance metric

For spoken document retrieval, it is crucial to consider out-of-vocabulary (OOV) words and the misrecognition of spoken words. Consequently, sub-word unit based recognition and retrieval methods have been proposed. This paper describes a Japanese spoken term detection method for spoken documents that robustly considers OOV words and misrecognition. To solve the problem of OOV keywords, we use indiv...


Word segmentation in Persian continuous speech using F0 contour

Word segmentation in continuous speech is a complex cognitive process. Previous research on spoken word segmentation has revealed that in fixed-stress languages, listeners use acoustic cues to stress to de-segment speech into words. It has been further assumed that stress in non-final or non-initial position hinders the demarcative function of this prosodic factor. In Persian, stress is retract...


Robust Photo Retrieval using World Semantics

Photos annotated with textual keywords can be thought of as resembling documents, and querying for photos by keywords is akin to the information retrieval done by search engines. A common approach to making IR more robust involves query expansion using a thesaurus or other lexical resource. The chief limitation is that keyword expansions tend to operate on a word level, and expanded keywords ar...




Publication date: 2006